The Optimal Quantile Estimator for Compressed Counting 



Ping Li (pingli@cornell.edu)
Cornell University, Ithaca, NY 14853 



Abstract 

Compressed Counting (CC)¹ was recently proposed for very efficiently computing the (approximate) $\alpha$th
frequency moments of data streams, where $0 < \alpha \le 2$. Several estimators were reported, including the
geometric mean estimator, the harmonic mean estimator, the optimal power estimator, etc. The geometric mean
estimator is particularly interesting for theoretical purposes. For example, when $\alpha \to 1$, the complexity
of CC (using the geometric mean estimator) is $O(1/\epsilon)$, breaking the well-known large-deviation bound
$O(1/\epsilon^2)$. The case $\alpha \approx 1$ has important applications, for example, computing the entropy of
data streams.

For practical purposes, this study proposes the optimal quantile estimator. Compared with previous estimators,
this estimator is computationally more efficient and is also more accurate when $\alpha > 1$.

1 Introduction 

Compressed Counting (CC) [4, 7] was very recently proposed for efficiently computing the $\alpha$th frequency
moments, where $0 < \alpha \le 2$, in data streams. The underlying technique of CC is maximally skewed stable
random projections, which significantly improves on the well-known algorithm based on symmetric stable random
projections, especially when $\alpha \to 1$. CC boils down to a statistical estimation problem, and various
estimators have been proposed [4, 2]. In this study, we present an estimator based on the optimal quantiles, which
is computationally more efficient and significantly more accurate when $\alpha > 1$, as long as the sample size is
not too small.

One direct application of CC is to estimate the entropy of data streams. A recent trend is to approximate entropy
using frequency moments and to estimate the frequency moments using symmetric stable random projections.
Applying CC to entropy estimation has demonstrated a huge improvement (e.g., 50-fold) over previous studies.
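One way to see why frequency moments near $\alpha = 1$ matter for entropy is through the Rényi entropy, which can
be written purely in terms of $F_{(\alpha)}$ and $F_{(1)}$ and tends to the Shannon entropy as $\alpha \to 1$. The
sketch below assumes this particular approximation (the text does not say which one the cited studies use), and
all function names are illustrative.

```python
import math

def frequency_moment(counts, alpha):
    """Exact alpha-th frequency moment: sum of counts[i]**alpha over nonzero entries."""
    return sum(c ** alpha for c in counts if c > 0)

def renyi_entropy_from_moments(f_alpha, f_1, alpha):
    """Renyi entropy H_alpha = log(F_alpha / F_1**alpha) / (1 - alpha), for alpha != 1."""
    return (math.log(f_alpha) - alpha * math.log(f_1)) / (1.0 - alpha)

def shannon_entropy(counts):
    """Shannon entropy (in nats) of the empirical distribution counts / sum(counts)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

counts = [5, 3, 1, 1, 10]   # a toy signal A_t
alpha = 1.001               # close to 1, where the approximation is tight
approx = renyi_entropy_from_moments(frequency_moment(counts, alpha), sum(counts), alpha)
exact = shannon_entropy(counts)
print(approx, exact)
```

In a streaming setting the exact moment would be replaced by an estimate of $F_{(\alpha)}$, while $F_{(1)}$ is
available from a single counter.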

CC was recently presented at MMDS 2008: Workshop on Algorithms for Modern Massive Data Sets. Slides
are available at http://www.stanford.edu/group/mmds/slides2008/li.pdf

1.1 The Relaxed Strict Turnstile Data Stream Model 

Compressed Counting (CC) assumes a relaxed strict Turnstile data stream model. In the Turnstile model [9], the
input stream $a_t = (i_t, I_t)$, $i_t \in [1, D]$, arriving sequentially describes the underlying signal $A$,
meaning

$$A_t[i_t] = A_{t-1}[i_t] + I_t, \tag{1}$$

where the increment $I_t$ can be either positive (insertion) or negative (deletion). Restricting $A_t[i] \ge 0$
at all $t$ results in the strict Turnstile model, which suffices for describing most natural phenomena. CC
constrains $A_t[i] \ge 0$ only at the time $t$ we care about; at any other time $s \ne t$, CC allows $A_s[i]$ to
be arbitrary.
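The difference between the strict and the relaxed strict model can be illustrated with a toy stream. The sketch
below is only an illustration: the class name `TurnstileSignal` and the example stream are hypothetical, and the
signal is stored explicitly, which a real streaming algorithm would avoid.

```python
from collections import defaultdict

class TurnstileSignal:
    """Explicitly stored signal A, updated by A[i] <- A[i] + increment."""

    def __init__(self):
        self.A = defaultdict(int)

    def update(self, i, increment):
        # Turnstile update: increments may be positive (insertion) or negative (deletion).
        self.A[i] += increment

    def is_nonnegative(self):
        # True when every coordinate is >= 0 at the current time.
        return all(v >= 0 for v in self.A.values())

signal = TurnstileSignal()
history = []
for i, inc in [(1, 2), (2, -1), (2, 3), (1, -2)]:
    signal.update(i, inc)
    history.append(signal.is_nonnegative())

# Coordinate 2 is negative after the second update, so the strict model is violated
# at that time; the relaxed strict model only needs nonnegativity at the final time.
print(history)
```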

Under the relaxed strict Turnstile model, the $\alpha$th frequency moment of a data stream $A_t$ is defined as

$$F_{(\alpha)} = \sum_{i=1}^{D} A_t[i]^{\alpha}. \tag{2}$$

When $\alpha = 1$, it is obvious that one can compute $F_{(1)} = \sum_{i=1}^{D} A_t[i] = \sum_{s=1}^{t} I_s$
trivially, using a simple counter. When $\alpha \ne 1$, however, computing $F_{(\alpha)}$ exactly requires $D$
counters.
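The contrast between the two cases can be made concrete. In the minimal sketch below (function names and the toy
stream are illustrative), the exact moment keeps one counter per coordinate, while the first moment only
accumulates the increments.

```python
def exact_moment(stream, alpha, D):
    """Exact alpha-th frequency moment using D counters (coordinates are 1..D)."""
    counters = [0] * (D + 1)            # index 0 is unused
    for i, inc in stream:
        counters[i] += inc
    return sum(c ** alpha for c in counters if c > 0)

def first_moment(stream):
    """F_(1) with a single counter: the running sum of all increments."""
    total = 0
    for _, inc in stream:
        total += inc
    return total

stream = [(1, 2), (3, 4), (1, -1), (2, 5)]   # final signal: A[1]=1, A[2]=5, A[3]=4
print(first_moment(stream), exact_moment(stream, 1, 3), exact_moment(stream, 2, 3))
```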

¹ The results were initially drafted in Jan 2008, as part of a report for private communications with several
theorists. That report was later filed to arXiv; to shorten the presentation, it excluded the content on the
optimal quantile estimator.

1.2 Maximally-skewed Stable Random Projections



Based on maximally skewed stable random projections, CC provides a very efficient mechanism for approximating
$F_{(\alpha)}$. One first generates a random matrix $R \in \mathbb{R}^{D \times k}$, whose entries are i.i.d.
samples of a $\beta$-skewed $\alpha$-stable distribution with scale parameter 1, denoted by
$r_{ij} \sim S(\alpha, \beta, 1)$.

By a property of stable distributions, the entries of the resultant projected vector
$X = R^{T} A_t \in \mathbb{R}^{k}$ are i.i.d. samples of a $\beta$-skewed $\alpha$-stable distribution whose scale
parameter is the $\alpha$th frequency moment of $A_t$ we are after:

$$x_j = \left[R^{T} A_t\right]_j = \sum_{i=1}^{D} r_{ij} A_t[i] \sim S\left(\alpha, \beta, F_{(\alpha)} = \sum_{i=1}^{D} A_t[i]^{\alpha}\right).$$

The skewness parameter $\beta \in [-1, 1]$. CC recommends $\beta = 1$, i.e., maximally skewed, for the best
performance.

In a real implementation, the linear projection $X = R^{T} A_t$ is conducted incrementally, using the fact that
the Turnstile model is also linear. That is, for every incoming $a_t = (i_t, I_t)$, we update
$x_j \leftarrow x_j + r_{i_t j} I_t$ for $j = 1$ to $k$. This procedure is similar to that of symmetric stable
random projections; the difference is the distribution of the elements in $R$.
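The incremental update can be sketched as follows. This is only an illustration of the linearity argument: the
entries of $R$ are drawn from a standard normal distribution as a placeholder (an assumption made here for
simplicity; CC itself would draw them from $S(\alpha, \beta = 1, 1)$), coordinates are 0-indexed, and the helper
names are hypothetical.

```python
import random

def make_projection_matrix(D, k, seed=0):
    """D x k random matrix. Placeholder entries: standard normal, NOT skewed stable."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(D)]

def update_sketch(x, R, i, increment):
    """Apply x_j <- x_j + r_{i j} * increment for j = 0..k-1 (one stream element)."""
    row = R[i]
    for j in range(len(x)):
        x[j] += row[j] * increment

D, k = 6, 4
R = make_projection_matrix(D, k)
stream = [(0, 2), (3, 5), (0, -1), (5, 7), (3, -2)]

x = [0.0] * k    # the sketch maintained by the streaming algorithm
A = [0] * D      # the full signal, kept here only to check the result
for i, inc in stream:
    update_sketch(x, R, i, inc)
    A[i] += inc

# Batch projection R^T A computed from the final signal.
batch = [sum(R[i][j] * A[i] for i in range(D)) for j in range(k)]
print(x, batch)
```

Because the update is linear, the sketch after processing the whole stream agrees with the batch projection of
the final signal, regardless of the order or sign of the increments.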



2 The Statistical Estimation Problem and Previous Estimators 

CC boils down to a statistical estimation problem. Given $k$ i.i.d. samples
$x_j \sim S(\alpha, \beta = 1, F_{(\alpha)})$, estimate the scale parameter $F_{(\alpha)}$.

Various estimators have been proposed for this problem, including the geometric mean estimator, the harmonic
mean estimator, the maximum likelihood estimator, and the optimal power estimator. Figure 1 compares their
asymptotic variances along with the asymptotic variance of the geometric mean estimator for symmetric stable
random projections [6].

Figure 1: Let $\hat{F}$ be an estimator of $F$ with asymptotic variance
$\mathrm{Var}(\hat{F}) = V \frac{F^2}{k} + O\left(\frac{1}{k^2}\right)$. We plot the $V$ values for the geometric
mean estimator, the harmonic mean estimator (for $\alpha < 1$), the optimal power estimator (the lower dashed
curve), and the optimal quantile estimator, along with the $V$ values for the geometric mean estimator for
symmetric stable random projections in [6] ("symmetric GM", the upper dashed curve). When $\alpha \to 1$, CC
achieves an "infinite improvement" in terms of the asymptotic variances.

2.1 The geometric mean estimator, $\hat{F}_{(\alpha),gm}$, for $0 < \alpha \le 2$ ($\alpha \ne 1$)



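$$\hat{F}_{(\alpha),gm} = \frac{\prod_{j=1}^{k} |x_j|^{\alpha/k}}{\frac{\cos^{k}\left(\frac{\kappa(\alpha)\pi}{2k}\right)}{\cos\left(\frac{\kappa(\alpha)\pi}{2}\right)} \left[\frac{2}{\pi} \sin\left(\frac{\pi\alpha}{2k}\right) \Gamma\left(1 - \frac{1}{k}\right) \Gamma\left(\frac{\alpha}{k}\right)\right]^{k}},$$

$$\mathrm{Var}\left(\hat{F}_{(\alpha),gm}\right) = \frac{F_{(\alpha)}^2}{k} \frac{\pi^2}{12} \left(\alpha^2 + 2 - 3\kappa^2(\alpha)\right) + O\left(\frac{1}{k^2}\right),$$

$$\kappa(\alpha) = \alpha \ \text{ if } \alpha < 1, \qquad \kappa(\alpha) = 2 - \alpha \ \text{ if } \alpha > 1.$$

$\hat{F}_{(\alpha),gm}$ is unbiased and has exponential tail bounds for all $0 < \alpha \le 2$.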
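A minimal sketch of the geometric mean estimator, evaluated in log space so that the $k$th powers stay numerically
stable. It is tested only for $\alpha = 0.5$, where the maximally skewed stable law is the Lévy distribution and a
sample with scale $F$ can be drawn as $F^2 / Z^2$ with $Z$ standard normal; this sampling convention is an
assumption chosen to agree with the moment formulas behind the estimator, and the function names are illustrative.

```python
import math
import random

def kappa(alpha):
    """kappa(alpha) = alpha for alpha < 1 and 2 - alpha for alpha > 1."""
    return alpha if alpha < 1 else 2.0 - alpha

def geometric_mean_estimate(samples, alpha):
    """Geometric mean estimator of the scale F_(alpha); needs k >= 2 and alpha != 1."""
    k = len(samples)
    kap = kappa(alpha)
    # log of prod_j |x_j|^(alpha / k)
    log_num = (alpha / k) * sum(math.log(abs(x)) for x in samples)
    # log of cos^k(kap*pi/(2k)) / cos(kap*pi/2) * [2/pi sin(pi*alpha/(2k)) G(1-1/k) G(alpha/k)]^k
    log_den = (k * math.log(math.cos(kap * math.pi / (2 * k)))
               - math.log(math.cos(kap * math.pi / 2))
               + k * (math.log(2 / math.pi)
                      + math.log(math.sin(math.pi * alpha / (2 * k)))
                      + math.lgamma(1 - 1 / k)
                      + math.lgamma(alpha / k)))
    return math.exp(log_num - log_den)

def sample_levy(F, k, rng):
    """Assumed convention for alpha = 0.5, beta = 1: x = F^2 / Z^2, Z ~ N(0, 1)."""
    return [F * F / rng.gauss(0.0, 1.0) ** 2 for _ in range(k)]

rng = random.Random(1)
F_true = 3.0
est = geometric_mean_estimate(sample_levy(F_true, 2000, rng), 0.5)
print(est)
```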

2.1.1 The harmonic mean estimator, $\hat{F}_{(\alpha),hm,c}$, for $0 < \alpha < 1$



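$$\hat{F}_{(\alpha),hm,c} = \frac{k \frac{\cos\left(\frac{\alpha\pi}{2}\right)}{\Gamma(1+\alpha)}}{\sum_{j=1}^{k} |x_j|^{-\alpha}} \left(1 - \frac{1}{k}\left(\frac{2\Gamma^2(1+\alpha)}{\Gamma(1+2\alpha)} - 1\right)\right),$$

$$E\left(\hat{F}_{(\alpha),hm,c}\right) = F_{(\alpha)} + O\left(\frac{1}{k^2}\right), \qquad \mathrm{Var}\left(\hat{F}_{(\alpha),hm,c}\right) = \frac{F_{(\alpha)}^2}{k}\left(\frac{2\Gamma^2(1+\alpha)}{\Gamma(1+2\alpha)} - 1\right) + O\left(\frac{1}{k^2}\right).$$

$\hat{F}_{(\alpha),hm,c}$ has exponential tail bounds.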
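A short sketch of the bias-corrected harmonic mean estimator, written from the moment identity
$E|x|^{-\alpha} = \cos(\alpha\pi/2) / (F_{(\alpha)} \Gamma(1+\alpha))$. As a check it uses $\alpha = 0.5$ with
Lévy samples drawn as $F^2 / Z^2$ (an assumed sampling convention that satisfies this identity); the function
names are illustrative.

```python
import math
import random

def harmonic_mean_estimate(samples, alpha):
    """Bias-corrected harmonic mean estimator of F_(alpha), intended for 0 < alpha < 1."""
    k = len(samples)
    g1 = math.gamma(1 + alpha)
    # Relative variance of |x|^(-alpha): 2*Gamma(1+alpha)^2 / Gamma(1+2*alpha) - 1
    rel_var = 2 * g1 * g1 / math.gamma(1 + 2 * alpha) - 1
    raw = k * math.cos(alpha * math.pi / 2) / g1 / sum(abs(x) ** (-alpha) for x in samples)
    return raw * (1 - rel_var / k)      # first-order bias correction

def sample_levy(F, k, rng):
    """Assumed convention for alpha = 0.5, beta = 1: x = F^2 / Z^2, Z ~ N(0, 1)."""
    return [F * F / rng.gauss(0.0, 1.0) ** 2 for _ in range(k)]

rng = random.Random(2)
F_true = 3.0
est = harmonic_mean_estimate(sample_levy(F_true, 2000, rng), 0.5)
print(est)
```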

2.2 The maximum likelihood estimator, $\hat{F}_{(0.5),mle,c}$, for $\alpha = 0.5$ only



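$$\hat{F}_{(0.5),mle,c} = \left(1 - \frac{3}{4}\frac{1}{k}\right) \sqrt{\frac{k}{\sum_{j=1}^{k} \frac{1}{x_j}}},$$

$$E\left(\hat{F}_{(0.5),mle,c}\right) = F_{(0.5)} + O\left(\frac{1}{k^2}\right), \qquad \mathrm{Var}\left(\hat{F}_{(0.5),mle,c}\right) = \frac{1}{2}\frac{F_{(0.5)}^2}{k} + \frac{9}{8}\frac{F_{(0.5)}^2}{k^2} + O\left(\frac{1}{k^3}\right).$$

$\hat{F}_{(0.5),mle,c}$ has exponential tail bounds.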
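For $\alpha = 0.5$ the maximum likelihood estimator has a closed form because the underlying law is the Lévy
distribution. The sketch below assumes samples of the form $F^2 / Z^2$ with $Z$ standard normal, so that
$\sum_j 1/x_j$ is a scaled chi-square variable; the function names are illustrative.

```python
import math
import random

def mle_estimate_half(samples):
    """Bias-corrected MLE of F_(0.5): (1 - 3/(4k)) * sqrt(k / sum_j 1/x_j)."""
    k = len(samples)
    return (1 - 0.75 / k) * math.sqrt(k / sum(1.0 / x for x in samples))

def sample_levy(F, k, rng):
    """Assumed convention for alpha = 0.5, beta = 1: x = F^2 / Z^2, Z ~ N(0, 1)."""
    return [F * F / rng.gauss(0.0, 1.0) ** 2 for _ in range(k)]

rng = random.Random(3)
F_true = 3.0
est = mle_estimate_half(sample_levy(F_true, 2000, rng))
print(est)
```

With a leading variance factor of $1/2$, this estimator is noticeably tighter than the geometric mean estimator
at the same sample size, but it is available only for this one value of $\alpha$.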

2.3 The optimal power estimator, $\hat{F}_{(\alpha),op,c}$, for $0 < \alpha \le 2$ ($\alpha \ne 1$)



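$$\hat{F}_{(\alpha),op,c} = \left(\frac{1}{k} \frac{\sum_{j=1}^{k} |x_j|^{\lambda^*\alpha}}{\frac{\cos\left(\frac{\kappa(\alpha)\lambda^*\pi}{2}\right)}{\cos^{\lambda^*}\left(\frac{\kappa(\alpha)\pi}{2}\right)} \frac{2}{\pi} \Gamma(1-\lambda^*) \Gamma(\lambda^*\alpha) \sin\left(\frac{\pi}{2}\lambda^*\alpha\right)}\right)^{1/\lambda^*}$$

$$\qquad \times \left(1 - \frac{1}{k} \frac{1}{2\lambda^*} \left(\frac{1}{\lambda^*} - 1\right) \left(\frac{\cos\left(\kappa(\alpha)\lambda^*\pi\right) \frac{2}{\pi} \Gamma(1-2\lambda^*) \Gamma(2\lambda^*\alpha) \sin\left(\pi\lambda^*\alpha\right)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda^*\pi}{2}\right) \frac{2}{\pi} \Gamma(1-\lambda^*) \Gamma(\lambda^*\alpha) \sin\left(\frac{\pi}{2}\lambda^*\alpha\right)\right]^2} - 1\right)\right),$$

$$\mathrm{Var}\left(\hat{F}_{(\alpha),op,c}\right) = F_{(\alpha)}^2 \frac{1}{\lambda^{*2} k} \left(\frac{\cos\left(\kappa(\alpha)\lambda^*\pi\right) \frac{2}{\pi} \Gamma(1-2\lambda^*) \Gamma(2\lambda^*\alpha) \sin\left(\pi\lambda^*\alpha\right)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda^*\pi}{2}\right) \frac{2}{\pi} \Gamma(1-\lambda^*) \Gamma(\lambda^*\alpha) \sin\left(\frac{\pi}{2}\lambda^*\alpha\right)\right]^2} - 1\right) + O\left(\frac{1}{k^2}\right),$$

$$\lambda^* = \operatorname{argmin}_{\lambda} \, g(\lambda; \alpha), \qquad g(\lambda; \alpha) = \frac{1}{\lambda^2} \left(\frac{\cos\left(\kappa(\alpha)\lambda\pi\right) \frac{2}{\pi} \Gamma(1-2\lambda) \Gamma(2\lambda\alpha) \sin\left(\pi\lambda\alpha\right)}{\left[\cos\left(\frac{\kappa(\alpha)\lambda\pi}{2}\right) \frac{2}{\pi} \Gamma(1-\lambda) \Gamma(\lambda\alpha) \sin\left(\frac{\pi}{2}\lambda\alpha\right)\right]^2} - 1\right).$$

When $0 < \alpha < 1$, $\lambda^* < 0$ and $\hat{F}_{(\alpha),op,c}$ has exponential tail bounds.

$\hat{F}_{(\alpha),op,c}$ becomes the harmonic mean estimator when $\alpha = 0+$, the arithmetic mean estimator
when $\alpha = 2$, and the maximum likelihood estimator when $\alpha = 0.5$.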
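The optimal power $\lambda^*$ has no closed form, but the variance factor $g(\lambda; \alpha)$ is easy to
minimize numerically. The sketch below uses a plain grid search over $0 < \lambda < 1/2$ for $\alpha = 1.5$ (a
simplifying choice made here: for $\alpha < 1$ the minimizer is negative and the Gamma functions have poles that
a grid would need to avoid). It also checks that $g$ approaches the geometric mean variance factor as
$\lambda \to 0$. All function names are illustrative.

```python
import math

def kappa(alpha):
    """kappa(alpha) = alpha for alpha < 1 and 2 - alpha for alpha > 1."""
    return alpha if alpha < 1 else 2.0 - alpha

def power_moment_factor(lam, alpha):
    """cos(kappa*lam*pi/2) * (2/pi) * Gamma(1-lam) * Gamma(lam*alpha) * sin(pi*lam*alpha/2)."""
    return (math.cos(kappa(alpha) * lam * math.pi / 2) * (2 / math.pi)
            * math.gamma(1 - lam) * math.gamma(lam * alpha)
            * math.sin(math.pi * lam * alpha / 2))

def g(lam, alpha):
    """Asymptotic variance factor of the power estimator with exponent lam*alpha."""
    # The numerator of the ratio is the same factor evaluated at 2*lam.
    ratio = power_moment_factor(2 * lam, alpha) / power_moment_factor(lam, alpha) ** 2
    return (ratio - 1) / lam ** 2

alpha = 1.5
grid = [i / 10000 for i in range(1, 5000)]          # lam in (0, 0.5)
lam_star = min(grid, key=lambda lam: g(lam, alpha))

# Variance factor of the geometric mean estimator, for comparison.
v_gm = math.pi ** 2 / 12 * (alpha ** 2 + 2 - 3 * kappa(alpha) ** 2)
print(lam_star, g(lam_star, alpha), v_gm)
```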

3 The Optimal Quantile Estimator



Because $X \sim S(\alpha, \beta = 1, F_{(\alpha)})$ belongs to the location-scale family (the location is always
zero), one can estimate the scale parameter simply from the sample quantiles.

3.1 A General Quantile Estimator 

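Assume $x_j \sim S(\alpha, 1, F_{(\alpha)})$, $j = 1$ to $k$. One possibility is to use the $q$-quantile of the
absolute values, i.e.,

$$\hat{F}_{(\alpha),q} = \left(\frac{q\text{-Quantile}\{|x_j|, \ j = 1, 2, ..., k\}}{W_q}\right)^{\alpha}, \tag{3}$$

where

$$W_q = q\text{-Quantile}\{|S(\alpha, \beta = 1, 1)|\}. \tag{4}$$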
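The quantile estimator needs only one order statistic and one precomputed constant $W_q$. The sketch below
illustrates it for $\alpha = 0.5$, where $W_q$ has a closed form under the assumed sampling convention
$x = F^2 / Z^2$ with $Z$ standard normal; for general $\alpha$, $W_q$ has to come from numerical tables. The
function names and the choice $q = 0.15$ are illustrative.

```python
import random
from statistics import NormalDist

def levy_abs_quantile(q):
    """W_q for |S(0.5, 1, 1)|, assuming the unit-scale variable is 1 / Z^2, Z ~ N(0, 1)."""
    # P(1/Z^2 <= w) = 2 * (1 - Phi(w^(-1/2))) = q  =>  w = 1 / Phi^{-1}(1 - q/2)^2
    z = NormalDist().inv_cdf(1 - q / 2)
    return 1.0 / (z * z)

def sample_quantile(values, q):
    """Simple order-statistic estimate of the q-quantile."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]

def quantile_estimate(samples, alpha, q, w_q):
    """General quantile estimator: (q-Quantile{|x_j|} / W_q) ** alpha."""
    return (sample_quantile([abs(x) for x in samples], q) / w_q) ** alpha

rng = random.Random(4)
F_true = 3.0
samples = [F_true ** 2 / rng.gauss(0.0, 1.0) ** 2 for _ in range(5000)]
q = 0.15
est = quantile_estimate(samples, 0.5, q, levy_abs_quantile(q))
print(est)
```

Unlike the geometric mean or power estimators, no fractional power of every sample is required; a single
selection followed by one power suffices.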

Denote $Z = |X|$, where $X \sim S(\alpha, 1, F_{(\alpha)})$. Note that when $\alpha < 1$, $Z = X$. Denote the
probability density function of $Z$ by $f_Z(z; \alpha, F_{(\alpha)})$, the cumulative distribution function by
$F_Z(z; \alpha, F_{(\alpha)})$, and the inverse cumulative distribution function by
$F_Z^{-1}(q; \alpha, F_{(\alpha)})$.

We can analyze the asymptotic (as $k \to \infty$) variance of $\hat{F}_{(\alpha),q}$, presented in Lemma 1.



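Lemma 1

$$\mathrm{Var}\left(\hat{F}_{(\alpha),q}\right) = \frac{F_{(\alpha)}^2}{k} \frac{(q - q^2)\alpha^2}{f_Z^2\left(F_Z^{-1}(q; \alpha, 1); \alpha, 1\right) \left(F_Z^{-1}(q; \alpha, 1)\right)^2} + O\left(\frac{1}{k^2}\right).$$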



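Proof: The proof follows directly from known statistical results on sample quantiles (e.g., Theorem 9.2 of the
cited reference) and the "delta" method:

$$\mathrm{Var}\left(\hat{F}_{(\alpha),q}\right) = \frac{q - q^2}{k f_Z^2\left(F_Z^{-1}(q; \alpha, F_{(\alpha)}); \alpha, F_{(\alpha)}\right)} \left(\frac{\alpha \left(F_Z^{-1}(q; \alpha, F_{(\alpha)})\right)^{\alpha - 1}}{W_q^{\alpha}}\right)^2 + O\left(\frac{1}{k^2}\right)$$

$$= \frac{F_{(\alpha)}^2}{k} \frac{(q - q^2)\alpha^2}{f_Z^2\left(F_Z^{-1}(q; \alpha, 1); \alpha, 1\right) \left(F_Z^{-1}(q; \alpha, 1)\right)^2} + O\left(\frac{1}{k^2}\right),$$

using the fact that $W_q = F_Z^{-1}(q; \alpha, 1)$ and

$$F_Z^{-1}(q; \alpha, F_{(\alpha)}) = F_{(\alpha)}^{1/\alpha} F_Z^{-1}(q; \alpha, 1), \qquad f_Z(z; \alpha, F_{(\alpha)}) = F_{(\alpha)}^{-1/\alpha} f_Z\left(z F_{(\alpha)}^{-1/\alpha}; \alpha, 1\right).$$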
We can choose $q = q^*$ to minimize the asymptotic variance factor,
$\frac{(q - q^2)\alpha^2}{f_Z^2\left(F_Z^{-1}(q; \alpha, 1); \alpha, 1\right) \left(F_Z^{-1}(q; \alpha, 1)\right)^2}$,
which appears to be a convex function of $q$, although there seems to be no simple algebraic method to prove it
(except when $\alpha = 0+$).

We denote the optimal quantile estimator by $\hat{F}_{(\alpha),oq} = \hat{F}_{(\alpha),q^*}$.

3.2 The Optimal Quantiles 

The optimal quantiles, denoted by $q^* = q^*(\alpha)$, have to be determined by numerical procedures, using the
simulated probability density functions for stable distributions. We used the fBasics package in R. We found,
however, that those functions had numerical problems when $1 < \alpha < 1.011$ and $0.989 < \alpha < 1$.

For all other estimators, we have not noticed any numerical issues, even when $\alpha$ differs from 1 by only a
small power of ten, in either direction. Therefore, we do not consider there to be any numerical instability in
CC, as far as the method itself is concerned.

Table 1 presents the numerical results, including $q^*$,
$W_{q^*} = q^*\text{-Quantile}\{|S(\alpha, \beta = 1, 1)|\}$, and the variance of $\hat{F}_{(\alpha),oq}$ (without
the $\frac{1}{k}$ term). The variance factor is also plotted in Figure 1, indicating significant improvement over
the geometric mean estimator when $\alpha > 1$.
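The numerical search for $q^*$ can be illustrated in the one case where everything is available in closed form.
The sketch below evaluates the variance factor of Lemma 1 for $\alpha = 0.5$, assuming the unit-scale variable is
$1 / Z^2$ with $Z$ standard normal (the Lévy distribution), and minimizes it over a grid of $q$ values. It does
not reproduce the paper's table, which relies on numerically computed stable densities; the function name is
illustrative.

```python
import math
from statistics import NormalDist

def variance_factor_half(q):
    """(q - q^2) * alpha^2 / (f_Z(W_q)^2 * W_q^2) for alpha = 0.5 (Levy case)."""
    alpha = 0.5
    z = NormalDist().inv_cdf(1 - q / 2)
    w = 1.0 / (z * z)                                   # W_q for the unit-scale law
    # Density of 1 / Z^2 at w: w^(-3/2) * exp(-1 / (2w)) / sqrt(2 * pi)
    density = w ** -1.5 * math.exp(-1.0 / (2.0 * w)) / math.sqrt(2.0 * math.pi)
    return (q - q * q) * alpha ** 2 / (density * w) ** 2

grid = [i / 1000 for i in range(1, 1000)]               # q in (0, 1)
q_star = min(grid, key=variance_factor_half)
v_oq = variance_factor_half(q_star)

v_gm = math.pi ** 2 / 6 * (1 - 0.5 ** 2)                # geometric mean factor at alpha = 0.5
v_mle = 0.5                                             # maximum likelihood factor at alpha = 0.5
print(q_star, v_oq, v_gm, v_mle)
```

In this special case the minimized factor falls between the maximum likelihood factor and the geometric mean
factor, which is the ordering one would expect from an estimator that uses a single well-chosen order statistic.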

3.3 Comments on the Optimal Quantile Estimator

The optimal quantile estimator has at least two advantages: 

• When the sample size $k$ is not too small (e.g., $k > 50$), $\hat{F}_{(\alpha),oq}$ is more accurate than
$\hat{F}_{(\alpha),gm}$, especially for $\alpha > 1$.

• $\hat{F}_{(\alpha),oq}$ is computationally more efficient.

The disadvantages are:

• For small samples (e.g., $k < 20$), $\hat{F}_{(\alpha),oq}$ behaves poorly when $\alpha > 1$.

• Its theoretical analysis, e.g., of variances and tail bounds, is based on the density function of skewed stable
distributions, which does not have a closed form. The tail bounds can be obtained similarly using the
method developed in [5].

• The important parameters, $q^*$ and $W_{q^*}$, are obtained from the numerically computed density functions.
Due to the numerical difficulty in those functions, we can only obtain $q^*$ and $W_{q^*}$ values for
$\alpha > 1.011$ and $\alpha < 0.989$.

4 Conclusion 

Compressed Counting (CC) dramatically improves on symmetric stable random projections, especially when
$\alpha \approx 1$, and has important applications in data stream computations such as entropy estimation.

CC boils down to a statistical estimation problem. We propose the optimal quantile estimator, which considerably
improves on the previously proposed geometric mean estimator when $\alpha > 1$, at least asymptotically. For
practical purposes, this estimator should be very useful. However, for theoretical purposes, it cannot replace the
geometric mean estimator.